{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "3.6 softmax回归地从零开始实现.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "TPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/chongzicbo/Dive-into-Deep-Learning-tf.keras/blob/master/3.6.%20softmax%E5%9B%9E%E5%BD%92%E5%9C%B0%E4%BB%8E%E9%9B%B6%E5%BC%80%E5%A7%8B%E5%AE%9E%E7%8E%B0.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nKQkMsC17BM2",
        "colab_type": "text"
      },
      "source": [
        "##3.6 softmax回归地从零开始实现\n",
        "&emsp;&emsp;这一节我们来动手实现softmax回归。首先导入本节实现所需的包或者模块"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "IhnQkZ6z7KZ3",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "%matplotlib inline\n",
        "import tensorflow as tf\n",
        "import matplotlib.pyplot as plt\n",
        "from IPython import  display\n",
        "from tensorflow import keras\n",
        "import tensorflow.data as tfdata"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "QecRUcI67snh",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "tf.enable_eager_execution()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4sJF6r0HHLSq",
        "colab_type": "text"
      },
      "source": [
        "###3.6.1. 获取和读取数据\n",
        "我们将使用Fashion-MNIST数据集，并设置批量大小为256。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "6TfTy4YXLs4D",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "batch_size=256\n",
        "buffer_size=10000\n",
        "def load_data_fashion_mnist(batch_size,buffer_size):\n",
        "  (x_train,y_train),(x_test,y_test)=keras.datasets.fashion_mnist.load_data()\n",
        "  train_iter=tfdata.Dataset.from_tensor_slices((x_train,y_train)).map(lambda x,y:(x/255,y)).shuffle(buffer_size).batch(batch_size)\n",
        "  test_iter=tfdata.Dataset.from_tensor_slices((x_test,y_test)).map(lambda x,y:(x/255,y)).batch(batch_size)\n",
        "  return train_iter,test_iter"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "9v_Xt893MNQB",
        "colab_type": "code",
        "outputId": "e99047d5-1d8a-41d1-fda1-3a320208b0cd",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 151
        }
      },
      "source": [
        "train_iter,test_iter=load_data_fashion_mnist(batch_size=batch_size,buffer_size=buffer_size)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz\n",
            "32768/29515 [=================================] - 0s 0us/step\n",
            "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz\n",
            "26427392/26421880 [==============================] - 0s 0us/step\n",
            "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz\n",
            "8192/5148 [===============================================] - 0s 0us/step\n",
            "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz\n",
            "4423680/4422102 [==============================] - 0s 0us/step\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "65t_F46THQ7Y",
        "colab_type": "text"
      },
      "source": [
        "###3.6.2. 初始化模型参数\n",
        "跟线性回归中的例子一样，我们将使用向量表示每个样本。已知每个样本输入是高和宽均为28像素的图像。模型的输入向量的长度是 28×28=784 ：该向量的每个元素对应图像中每个像素。由于图像有10个类别，单层神经网络输出层的输出个数为10，因此softmax回归的权重和偏差参数分别为 784×10 和 1×10 的矩阵。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "a0apJwtEMbBp",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "num_inputs=784\n",
        "num_outputs=10\n",
        "\n",
        "W=tf.Variable(tf.random.normal(shape=(num_inputs,num_outputs)),trainable=True)\n",
        "b=tf.Variable(tf.zeros(num_outputs),trainable=True)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Nv0idMSeHqkR",
        "colab_type": "text"
      },
      "source": [
        "###3.6.3. 实现softmax运算\n",
        "在介绍如何定义softmax回归之前，我们先描述一下对如何对多维张量按维度操作。在下面的例子中，给定一个矩阵X。我们可以只对其中同一列（axis=0）或同一行（axis=1）的元素求和，并在结果中保留行和列这两个维度（keepdims=True）。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "UFPxVifTMbMY",
        "colab_type": "code",
        "outputId": "8940fb9b-5663-4909-dcca-e32eb1721d25",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 84
        }
      },
      "source": [
        "X=tf.constant([[1,2,3],[4,5,6]])\n",
        "tf.reduce_sum(input_tensor=X,axis=0,keep_dims=True),tf.reduce_sum(input_tensor=X,axis=1,keep_dims=True)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "(<tf.Tensor: id=972542, shape=(1, 3), dtype=int32, numpy=array([[5, 7, 9]], dtype=int32)>,\n",
              " <tf.Tensor: id=972544, shape=(2, 1), dtype=int32, numpy=\n",
              " array([[ 6],\n",
              "        [15]], dtype=int32)>)"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 47
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TF7AJbZgHW6z",
        "colab_type": "text"
      },
      "source": [
        "下面我们就可以定义前面小节里介绍的softmax运算了。在下面的函数中，矩阵X的行数是样本数，列数是输出个数。为了表达样本预测各个输出的概率，softmax运算会先通过exp函数对每个元素做指数运算，再对exp矩阵同行元素求和，最后令矩阵每行各元素与该行元素之和相除。这样一来，最终得到的矩阵每行元素和为1且非负。因此，该矩阵每行都是合法的概率分布。softmax运算的输出矩阵中的任意一行元素代表了一个样本在各个输出类别上的预测概率。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "GUuroZtgNNhb",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def softmax(X):\n",
        "  X_exp=tf.exp(X)\n",
        "  partition=tf.reduce_sum(X_exp,axis=1,keep_dims=True)\n",
        "  return X_exp/partition"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mWG90zJKH4sx",
        "colab_type": "text"
      },
      "source": [
        "可以看到，对于随机输入，我们将每个元素变成了非负数，且每一行和为1。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "39jtmCL-NNkZ",
        "colab_type": "code",
        "outputId": "7a7b86f3-6a2d-4085-fd9e-28ee72e1ae8a",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 101
        }
      },
      "source": [
        "X=tf.random.normal(shape=(2,5))\n",
        "X_prob=softmax(X)\n",
        "X_prob,tf.reduce_sum(X_prob,axis=1)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "(<tf.Tensor: id=972554, shape=(2, 5), dtype=float32, numpy=\n",
              " array([[0.07499056, 0.03552928, 0.16765563, 0.7022609 , 0.01956362],\n",
              "        [0.02011905, 0.6963503 , 0.12581605, 0.09248032, 0.06523434]],\n",
              "       dtype=float32)>,\n",
              " <tf.Tensor: id=972556, shape=(2,), dtype=float32, numpy=array([1., 1.], dtype=float32)>)"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 49
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2PdPzvvUH7uW",
        "colab_type": "text"
      },
      "source": [
        "###3.6.4. 定义模型\n",
        "有了softmax运算，我们可以定义上节描述的softmax回归模型了。这里通过reshape函数将每张原始图像改成长度为num_inputs的向量。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "UpGTvCZiN7Vd",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def net(X):\n",
        "  return softmax(tf.matmul(tf.reshape(X,(-1,num_inputs)),W)+b)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YBMxQbmJIB66",
        "colab_type": "text"
      },
      "source": [
        "###3.6.5. 定义损失函数\n",
        "上一节中，我们介绍了softmax回归使用的交叉熵损失函数。在下面的例子中，变量y_hat是2个样本在3个类别的预测概率，变量y是这2个样本的标签类别，通过计算得到了2个样本的标签的预测概率。与“softmax回归”一节数学表述中标签类别离散值从1开始逐一递增不同，在代码中，标签类别的离散值是从0开始逐一递增的。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "sK3oYrbLOP9G",
        "colab_type": "code",
        "outputId": "0c943d77-c672-4867-b676-f1babf28e728",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 34
        }
      },
      "source": [
        "y_hat=tf.constant([[0.1,0.3,0.6],[0.3,0.2,0.5]])\n",
        "y=tf.constant([0,2])\n",
        "tf.reduce_max(tf.one_hot(y,depth=tf.shape(y_hat)[-1])*y_hat,axis=1)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<tf.Tensor: id=972569, shape=(2,), dtype=float32, numpy=array([0.1, 0.5], dtype=float32)>"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 51
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AbL95UUFIVUR",
        "colab_type": "text"
      },
      "source": [
        "下面实现了“softmax回归”一节中介绍的交叉熵损失函数。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "CPD2AOTiO4Wb",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def cross_entropy(y_hat,y):\n",
        "  return -tf.log(tf.reduce_max(tf.one_hot(y,depth=tf.shape(y_hat)[-1])*y_hat,axis=1))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Jk_UeFzDIYih",
        "colab_type": "text"
      },
      "source": [
        "###3.6.6. 计算分类准确率\n",
        "给定一个类别的预测概率分布y_hat，我们把预测概率最大的类别作为输出类别。如果它与真实类别y一致，说明这次预测是正确的。分类准确率即正确预测数量与总预测数量之比。\n",
        "\n",
        "为了演示准确率的计算，下面定义准确率accuracy函数。其中tf.argmax(axis=1)返回矩阵y_hat每行中最大元素的索引，且返回结果与变量y形状相同。由于标签类型为整数，我们先将变量y变换为浮点数再进行相等条件判断。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "WEir4Cl-YHoQ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def accuracy(y_hat,y):\n",
        "  return tf.equal(tf.cast(tf.argmax(y_hat,axis=1),tf.float32),tf.cast(y,tf.float32)).numpy().mean()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ajaqnrL5ZHKz",
        "colab_type": "code",
        "outputId": "52ff3e63-d6d5-4780-cde1-259455b20715",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 34
        }
      },
      "source": [
        "tf.equal(tf.cast(tf.argmax(y_hat,axis=1),tf.float32),tf.cast(y,tf.float32)).numpy().mean()"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0.5"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 54
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "xvGxp9RkYr6i",
        "colab_type": "code",
        "outputId": "47cf3725-8b72-496e-d7e4-0713b33ea0ec",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 34
        }
      },
      "source": [
        "accuracy(y_hat,y)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0.5"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 55
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e2DXNuGXIoBJ",
        "colab_type": "text"
      },
      "source": [
        "类似地，我们可以评价模型net在数据集data_iter上的准确率。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "B_Z_Bg_QY0bl",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def evaluate_accuracy(data_iter,net):\n",
        "  acc_sum,n=0.0,0\n",
        "  for X,y in data_iter:\n",
        "    # y=tf.cast(y,tf.float32)\n",
        "    acc_sum+=tf.equal(tf.cast(tf.argmax(net(X),axis=1),tf.float32),tf.cast(y,tf.float32)).numpy().sum()\n",
        "    n+=tf.shape(y)[0].numpy()\n",
        "  return acc_sum/n"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "mreWBbgburCD",
        "colab_type": "code",
        "outputId": "6d22b71a-3654-4051-c65f-ff6fa5eb8912",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 34
        }
      },
      "source": [
        "evaluate_accuracy(test_iter,net)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0.1189"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 57
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YoEjbaQ9IrUN",
        "colab_type": "text"
      },
      "source": [
        "###3.6.7. 训练模型\n",
        "训练softmax回归的实现跟“线性回归的从零开始实现”一节介绍的线性回归中的实现非常相似。我们同样使用小批量随机梯度下降来优化模型的损失函数。在训练模型时，迭代周期数num_epochs和学习率lr都是可以调的超参数。改变它们的值可能会得到分类更准确的模型。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "4HI9Ahi0vHw1",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "num_epochs,lr=5,0.1\n",
        "def sgd(params,loss,t,lr,batch_size):\n",
        "  for param in params:\n",
        "    dl_dp=t.gradient(loss,param) #求梯度\n",
        "    param.assign_sub(lr*dl_dp/batch_size) #更新梯度\n",
        "\n",
        "def train_ch3(net,train_iter,test_iter,loss,num_epochs,batch_size,params=None,lr=None,trainer=None):\n",
        "  for epoch in range(num_epochs):\n",
        "    train_l_sum,train_acc_sum,n=0.0,0.0,0\n",
        "    for X,y in train_iter:\n",
        "      if trainer is None:\n",
        "        with tf.GradientTape(persistent=True) as t:\n",
        "          y_hat=net(X)\n",
        "          l=tf.reduce_sum(loss(y_hat,y))\n",
        "        sgd(params,l,t,lr,batch_size)\n",
        "      else:\n",
        "        # print('更新梯度')\n",
        "        y_hat=net(X)\n",
        "        l=tf.reduce_sum(loss(y_hat,y))\n",
        "        trainer.minimize(lambda:loss(net(X),y),global_step=tf.train.get_or_create_global_step())\n",
        "        # print('梯度更新完毕')\n",
        "      train_l_sum+=l.numpy()\n",
        "      train_acc_sum+=tf.equal(tf.cast(tf.argmax(y_hat,axis=1),tf.float32),tf.cast(y,tf.float32)).numpy().sum()\n",
        "      n+=tf.shape(y)[0]\n",
        "    test_acc=evaluate_accuracy(test_iter,net)\n",
        "    print('epoch %d,loss %.4f,train acc %.3f,test acc %.3f'%(epoch+1,train_l_sum/n,train_acc_sum/n,test_acc))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "i8GYFYtFux1L",
        "colab_type": "code",
        "outputId": "d07b6255-1ee9-435d-8517-a3f82336b2c3",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 101
        }
      },
      "source": [
        "train_ch3(net,train_iter,test_iter,cross_entropy,num_epochs,batch_size,[W,b],lr)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "epoch 1,loss 0.7447,train acc 0.804,test acc 0.786\n",
            "epoch 2,loss 0.7359,train acc 0.805,test acc 0.787\n",
            "epoch 3,loss 0.7278,train acc 0.806,test acc 0.788\n",
            "epoch 4,loss 0.7201,train acc 0.807,test acc 0.788\n",
            "epoch 5,loss 0.7129,train acc 0.808,test acc 0.788\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "KJ_k5U1V7pGg",
        "colab_type": "code",
        "outputId": "50e9743d-e152-4ce5-cbe5-c438bbee6ee2",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 185
        }
      },
      "source": [
        "optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.001)\n",
        "train_ch3(net,train_iter,test_iter,cross_entropy,num_epochs=10,batch_size=batch_size,params=[W,b],lr=None,trainer=optimizer)"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "epoch 1,loss 2.5528,train acc 0.570,test acc 0.668\n",
            "epoch 2,loss 1.3727,train acc 0.708,test acc 0.720\n",
            "epoch 3,loss 1.1619,train acc 0.741,test acc 0.743\n",
            "epoch 4,loss 1.0501,train acc 0.757,test acc 0.756\n",
            "epoch 5,loss 0.9763,train acc 0.769,test acc 0.766\n",
            "epoch 6,loss 0.9217,train acc 0.777,test acc 0.771\n",
            "epoch 7,loss 0.8787,train acc 0.783,test acc 0.776\n",
            "epoch 8,loss 0.8435,train acc 0.789,test acc 0.778\n",
            "epoch 9,loss 0.8139,train acc 0.793,test acc 0.780\n",
            "epoch 10,loss 0.7885,train acc 0.797,test acc 0.782\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "oXSQA8J8ygI6",
        "colab_type": "code",
        "outputId": "48d4dd85-644c-48ca-ab08-0607b7119965",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 35
        }
      },
      "source": [
        "for X,y in train_iter:\n",
        "  out=net(X)\n",
        "  print(out.shape)\n",
        "  break"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "(256, 10)\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NvfcckU1z4eR",
        "colab_type": "text"
      },
      "source": [
        "###3.6.8. 小结\n",
        "可以使用softmax回归做多类别分类。与训练线性回归相比，你会发现训练softmax回归的步骤和它非常相似：获取并读取数据、定义模型和损失函数并使用优化算法训练模型。事实上，绝大多数深度学习模型的训练都有着类似的步骤。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "McE8jIAgI5ll",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        ""
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}